NCOR: An FPGA-Friendly Nonblocking Data Cache for Soft Processors with Runahead Execution

Authors

  • Kaveh Aasaraai
  • Andreas Moshovos
Abstract

Soft processors often use data caches to reduce the gap between processor and main memory speeds. To achieve high efficiency, simple, blocking caches are used. Such caches are not appropriate for processor designs such as Runahead and out-of-order execution, which require nonblocking caches to tolerate main memory latencies. Instead, these processors use nonblocking caches to extract memory-level parallelism and improve performance. However, conventional nonblocking cache designs are expensive and slow on FPGAs because they use content-addressable memories (CAMs). This work proposes NCOR, an FPGA-friendly nonblocking cache that exploits the key properties of Runahead execution. NCOR does not require CAMs and utilizes smart cache controllers. A 4 KB NCOR operates at 329 MHz on Stratix III FPGAs while using only 270 logic elements. A 32 KB NCOR operates at 278 MHz and uses 269 logic elements.

Similar resources

Enhancing MLP: Runahead Execution and Related Techniques

The growing memory wall makes speedups increasingly difficult to achieve on applications that exhibit difficult-to-predict memory access patterns. The problem is that although modern processors provide multiple high-bandwidth execution units, applications that experience frequent cache misses are only executed with high IPC in the periods between misses. As main memory latencies increase from 2...

MLP-Aware Runahead Threads in a Simultaneous Multithreading Processor

Threads experiencing long-latency loads on a simultaneous multithreading (SMT) processor may clog shared processor resources without making forward progress, thereby starving other threads and reducing overall system throughput. An elegant solution to the long-latency load problem in SMT processors is to employ runahead execution. Runahead threads do not block commit on a long-latency load but i...

Citation: James Dundas and Trevor Mudge. Improving data cache performance by pre-exe...

In this paper we propose and evaluate a technique that improves first-level data cache performance by pre-executing future instructions under a data cache miss. We show that these pre-executed instructions can generate highly accurate data prefetches, particularly when the first-level cache is small. The technique is referred to as runahead processing. The hardware required to implement runahead...

Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors

Today’s high-performance processors tolerate long-latency operations by means of out-of-order execution. However, as latencies increase, the size of the instruction window must increase even faster if we are to continue to tolerate these latencies. We have already reached the point where the size of an instruction window that can handle these latencies is prohibitively large, in terms of both d...

Overlay Architectures for FPGA-Based Software Packet Processing

Martin Labrecque, Doctor of Philosophy, Graduate Department of Electrical and Computer Engineering, University of Toronto, 2011. Packet processing is the enabling technology of networked information systems such as the Internet and is usually performed with fixed-function custom-made ASIC chips. As communication protocols evolve rapidly...

Journal:
  • Int. J. Reconfig. Comp.

Volume 2012

Publication date: 2012